313 research outputs found

    Do System Test Cases Grow Old?

    Full text link
    Companies increasingly use either manual or automated system testing to ensure the quality of their software products. As a system evolves and is extended with new features the test suite also typically grows as new test cases are added. To ensure software quality throughout this process the test suite is continously executed, often on a daily basis. It seems likely that newly added tests would be more likely to fail than older tests but this has not been investigated in any detail on large-scale, industrial software systems. Also it is not clear which methods should be used to conduct such an analysis. This paper proposes three main concepts that can be used to investigate aging effects in the use and failure behavior of system test cases: test case activation curves, test case hazard curves, and test case half-life. To evaluate these concepts and the type of analysis they enable we apply them on an industrial software system containing more than one million lines of code. The data sets comes from a total of 1,620 system test cases executed a total of more than half a million times over a time period of two and a half years. For the investigated system we find that system test cases stay active as they age but really do grow old; they go through an infant mortality phase with higher failure rates which then decline over time. The test case half-life is between 5 to 12 months for the two studied data sets.Comment: Updated with nicer figs without border around the

    Finding a boundary between valid and invalid regions of the input space

    Full text link
    In the context of robustness testing, the boundary between the valid and invalid regions of the input space can be an interesting source of erroneous inputs. Knowing where a specific software under test (SUT) has a boundary is essential for validation in relation to requirements. However, finding where a SUT actually implements the boundary is a non-trivial problem that has not gotten much attention. This paper proposes a method of finding the boundary between the valid and invalid regions of the input space. The proposed method consists of two steps. First, test data generators, directed by a search algorithm to maximise distance to known, valid test cases, generate valid test cases that are closer to the boundary. Second, these valid test cases undergo mutations to try to push them over the boundary and into the invalid part of the input space. This results in a pair of test sets, one consisting of test cases on the valid side of the boundary and a matched set on the outer side, with only a small distance between the two sets. The method is evaluated on a number of examples from the standard library of a modern programming language. We propose a method of determining the boundary between valid and invalid regions of the input space and apply it on a SUT that has a non-contiguous valid region of the input space. From the small distance between the developed pairs of test sets, and the fact that one test set contains valid test cases and the other invalid test cases, we conclude that the pair of test sets described the boundary between the valid and invalid regions of that input space. Differences of behaviour can be observed between different distances and sets of mutation operators, but all show that the method is able to identify the boundary between the valid and invalid regions of the input space. This is an important step towards more automated robustness testing.Comment: 10 pages, conferenc

    Searching for test data with feature diversity

    Full text link
    There is an implicit assumption in software testing that more diverse and varied test data is needed for effective testing and to achieve different types and levels of coverage. Generic approaches based on information theory to measure and thus, implicitly, to create diverse data have also been proposed. However, if the tester is able to identify features of the test data that are important for the particular domain or context in which the testing is being performed, the use of generic diversity measures such as this may not be sufficient nor efficient for creating test inputs that show diversity in terms of these features. Here we investigate different approaches to find data that are diverse according to a specific set of features, such as length, depth of recursion etc. Even though these features will be less general than measures based on information theory, their use may provide a tester with more direct control over the type of diversity that is present in the test data. Our experiments are carried out in the context of a general test data generation framework that can generate both numerical and highly structured data. We compare random sampling for feature-diversity to different approaches based on search and find a hill climbing search to be efficient. The experiments highlight many trade-offs that needs to be taken into account when searching for diversity. We argue that recurrent test data generation motivates building statistical models that can then help to more quickly achieve feature diversity.Comment: This version was submitted on April 14th 201

    Towards Automated Boundary Value Testing with Program Derivatives and Search

    Full text link
    A natural and often used strategy when testing software is to use input values at boundaries, i.e. where behavior is expected to change the most, an approach often called boundary value testing or analysis (BVA). Even though this has been a key testing idea for long it has been hard to clearly define and formalize. Consequently, it has also been hard to automate. In this research note we propose one such formalization of BVA by, in a similar way as to how the derivative of a function is defined in mathematics, considering (software) program derivatives. Critical to our definition is the notion of distance between inputs and outputs which we can formalize and then quantify based on ideas from Information theory. However, for our (black-box) approach to be practical one must search for test inputs with specific properties. Coupling it with search-based software engineering is thus required and we discuss how program derivatives can be used as and within fitness functions. This brief note does not allow a deeper, empirical investigation but we use a simple illustrative example throughout to introduce the main ideas. By combining program derivatives with search, we thus propose a practical as well as theoretically interesting technique for automated boundary value (analysis and) testing

    Maintenance of Automated Test Suites in Industry: An Empirical study on Visual GUI Testing

    Full text link
    Context: Verification and validation (V&V) activities make up 20 to 50 percent of the total development costs of a software system in practice. Test automation is proposed to lower these V&V costs but available research only provides limited empirical data from industrial practice about the maintenance costs of automated tests and what factors affect these costs. In particular, these costs and factors are unknown for automated GUI-based testing. Objective: This paper addresses this lack of knowledge through analysis of the costs and factors associated with the maintenance of automated GUI-based tests in industrial practice. Method: An empirical study at two companies, Siemens and Saab, is reported where interviews about, and empirical work with, Visual GUI Testing is performed to acquire data about the technique's maintenance costs and feasibility. Results: 13 factors are observed that affect maintenance, e.g. tester knowledge/experience and test case complexity. Further, statistical analysis shows that developing new test scripts is costlier than maintenance but also that frequent maintenance is less costly than infrequent, big bang maintenance. In addition a cost model, based on previous work, is presented that estimates the time to positive return on investment (ROI) of test automation compared to manual testing. Conclusions: It is concluded that test automation can lower overall software development costs of a project whilst also having positive effects on software quality. However, maintenance costs can still be considerable and the less time a company currently spends on manual testing, the more time is required before positive, economic, ROI is reached after automation
    corecore